Skip to content

feat(modules): add ability to restart modules#1755

Open
paul-nechifor wants to merge 2 commits intodevfrom
paul/feat/restart-modules
Open

feat(modules): add ability to restart modules#1755
paul-nechifor wants to merge 2 commits intodevfrom
paul/feat/restart-modules

Conversation

@paul-nechifor
Copy link
Copy Markdown
Contributor

@paul-nechifor paul-nechifor commented Apr 6, 2026

Problem

We cannot restart modules.

Closes DIM-782

Solution

  • Can restart modules.
  • Does so on a new worker so that the code is imported again.

Breaking Changes

None

How to Test

●  uv run python
Python 3.12.11 (main, Mar 16 2026, 23:32:07) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dimos.core.coordination.module_coordinator import ModuleCoordinator
>>> from dimos.robot.unitree.go2.connection import GO2Connection
>>> coordinator = ModuleCoordinator.build(GO2Connection.blueprint())
23:56:09.932[inf][dination/module_coordinator.py] Building the blueprint
23:56:09.938[inf][dination/module_coordinator.py] Starting the modules
23:56:09.964[inf][ation/worker_manager_python.py] Worker pool started. n_workers=2
mujoco_menagerie not found. Downloading...
Successfully downloaded mujoco_menagerie
23:56:42.448[inf][/coordination/python_worker.py] Deployed module. module=GO2Connection module_id=0 worker_id=0
23:56:42.449[inf][dination/module_coordinator.py] Transport module=GO2Connection name=pointcloud original_name=pointcloud topic=/pointcloud#sensor_msgs.PointCloud2 transport=LCMTransport type=dimos.msgs.sensor_msgs.PointCloud2.PointCloud2
23:56:42.450[inf][dination/module_coordinator.py] Transport module=GO2Connection name=color_image original_name=color_image topic=/color_image#sensor_msgs.Image transport=LCMTransport type=dimos.msgs.sensor_msgs.Image.Image
23:56:42.450[inf][dination/module_coordinator.py] Transport module=GO2Connection name=camera_info original_name=camera_info topic=/camera_info#sensor_msgs.CameraInfo transport=LCMTransport type=dimos.msgs.sensor_msgs.CameraInfo.CameraInfo
23:56:42.450[inf][dination/module_coordinator.py] Transport module=GO2Connection name=cmd_vel original_name=cmd_vel topic=/cmd_vel#geometry_msgs.Twist transport=LCMTransport type=dimos.msgs.geometry_msgs.Twist.Twist
23:56:42.450[inf][dination/module_coordinator.py] Transport module=GO2Connection name=odom original_name=odom topic=/odom#geometry_msgs.PoseStamped transport=LCMTransport type=dimos.msgs.geometry_msgs.PoseStamped.PoseStamped
23:56:42.450[inf][dination/module_coordinator.py] Transport module=GO2Connection name=lidar original_name=lidar topic=/lidar#sensor_msgs.PointCloud2 transport=LCMTransport type=dimos.msgs.sensor_msgs.PointCloud2.PointCloud2
23:56:54.846[inf][os/simulation/mujoco/policy.py] Loaded policy: /home/p/pro/dimensional/dimos/data/mujoco_sim/unitree_go1_policy.onnx with providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
23:56:54.883[inf][t/unitree/mujoco_connection.py] MuJoCo process started successfully
>>> coordinator.restart_module(GO2Connection)
23:57:22.507[inf][/coordination/python_worker.py] Deployed module. module=GO2Connection module_id=1 worker_id=2
23:57:34.693[inf][os/simulation/mujoco/policy.py] Loaded policy: /home/p/pro/dimensional/dimos/data/mujoco_sim/unitree_go1_policy.onnx with providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
23:57:34.736[inf][t/unitree/mujoco_connection.py] MuJoCo process started successfully
<dimos.core.rpc_client.RPCClient object at 0x78714cb6b6e0>

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR adds restart_module to ModuleCoordinator, enabling a deployed Python module to be stopped, optionally source-reloaded via importlib.reload, and redeployed on a fresh worker process while preserving stream transports and module references. Two defects in the implementation need attention before merge.

  • restart_module calls self._deployed_atoms[module_class] (line 365) without a guard; modules deployed through coordinator.deploy() are never added to _deployed_atoms, so the call raises KeyError despite passing the earlier guard.
  • Stream transport reconnection uses stream_ref.name (the original name) to look up _transport_registry, which is keyed by remapped names; any module with remapped streams will silently lose its transport after restart.

Confidence Score: 4/5

Not safe to merge as-is due to two P1 defects: a KeyError on modules deployed via deploy() and a silent transport-wiring failure when stream remappings are used.

Two P1 bugs are present in restart_module — an unguarded dict access that crashes for modules deployed outside the blueprint path, and a key mismatch in _transport_registry that silently disconnects remapped streams. Both are straightforward to fix but affect correctness of the primary feature. All other findings are P2 style issues.

dimos/core/coordination/module_coordinator.py lines 365 and 397-402

Important Files Changed

Filename Overview
dimos/core/coordination/module_coordinator.py Adds unload_module and restart_module; restart_module has a KeyError risk on modules deployed via deploy() and a silent transport-rewiring bug when stream remappings are used
dimos/core/coordination/worker_manager_python.py Adds deploy_fresh (spawns a new worker per restart) and undeploy (shuts down the worker if it becomes empty); logic is correct
dimos/core/coordination/python_worker.py Adds undeploy_module to PythonWorker; sends undeploy_module request to the worker process and removes the module from the local map
dimos/core/coordination/test_module_coordinator.py Adds 6 restart_module tests covering proxy replacement, stream wiring, ref rewiring, worker lifecycle, and importlib.reload invocation
dimos/core/coordination/test_module_reloading.py Integration test for live source-reload via subprocess REPL; missing @pytest.mark.slow marker may cause CI timeouts
dimos/core/coordination/_test_module.py New AliceModule test fixture used by the reload integration test
dimos/utils/importing.py New utility for importing a module from a file path; not referenced by any changed file in this PR
docs/usage/modules.md Adds clear documentation and usage example for restart_module

Sequence Diagram

sequenceDiagram
    participant User
    participant MC as ModuleCoordinator
    participant WMP as WorkerManagerPython
    participant OW as OldPythonWorker
    participant NW as NewPythonWorker

    User->>MC: restart_module(MyModule)
    MC->>MC: read _deployed_atoms[module_class]
    MC->>MC: capture inbound_refs, outbound_refs
    MC->>MC: unload_module(module_class)
    MC->>OW: proxy.stop()
    MC->>WMP: undeploy(proxy)
    WMP->>OW: undeploy_module(module_id)
    OW-->>WMP: ok
    WMP->>OW: shutdown() if empty
    OW-->>WMP: ok
    WMP->>WMP: remove worker from pool
    opt reload_source=True
        MC->>MC: importlib.reload(source_mod)
        MC->>MC: new_class = getattr(source_mod, name)
    end
    MC->>WMP: deploy_fresh(new_class, config, kwargs)
    WMP->>NW: start_process()
    WMP->>NW: deploy_module(new_class, ...)
    NW-->>WMP: actor
    WMP-->>MC: new_proxy (RPCClient)
    MC->>MC: reconnect streams via _transport_registry
    MC->>MC: rewire inbound module refs
    MC->>MC: rewire outbound module refs
    MC->>NW: new_proxy.build()
    MC->>NW: new_proxy.start()
    MC-->>User: new_proxy
Loading

Reviews (1): Last reviewed commit: "feat(modules): add ability to restart mo..." | Re-trigger Greptile

@paul-nechifor paul-nechifor force-pushed the paul/feat/restart-modules branch from a813ca8 to e1cf940 Compare April 7, 2026 00:13
@paul-nechifor paul-nechifor force-pushed the paul/feat/restart-modules branch from e1cf940 to 869648f Compare April 7, 2026 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant